Stable Function Approximation in Dynamic Programming

نویسنده

  • Geoffrey J. Gordon
چکیده

The success of reinforcement learning in practical problems depends on the ability to combine function approximation with temporal di erence methods such as value iteration. Experiments in this area have produced mixed results; there have been both notable successes and notable disappointments. Theory has been scarce, mostly due to the difculty of reasoning about function approximators that generalize beyond the observed data. We provide a proof of convergence for a wide class of temporal di erence methods involving function approximators such as k-nearest-neighbor, and show experimentally that these methods can be useful. The proof is based on a view of function approximators as expansion or contraction mappings. In addition, we present a novel view of tted value iteration: an approximate algorithm for one environment turns out to be an exact algorithm for a di erent environment.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Extracting Dynamics Matrix of Alignment Process for a Gimbaled Inertial Navigation System Using Heuristic Dynamic Programming Method

In this paper, with the aim of estimating internal dynamics matrix of a gimbaled Inertial Navigation system (as a discrete Linear system), the discretetime Hamilton-Jacobi-Bellman (HJB) equation for optimal control has been extracted. Heuristic Dynamic Programming algorithm (HDP) for solving equation has been presented and then a neural network approximation for cost function and control input ...

متن کامل

Minimizing a General Penalty Function on a Single Machine via Developing Approximation Algorithms and FPTASs

This paper addresses the Tardy/Lost penalty minimization on a single machine. According to this penalty criterion, if the tardiness of a job exceeds a predefined value, the job will be lost and penalized by a fixed value. Besides its application in real world problems, Tardy/Lost measure is a general form for popular objective functions like weighted tardiness, late work and tardiness with reje...

متن کامل

An approximation algorithm and FPTAS for Tardy/Lost minimization with common due dates on a single machine

This paper addresses the Tardy/Lost penalty minimization with common due dates on a single machine. According to this performance measure, if the tardiness of a job exceeds a predefined value, the job will be lost and penalized by a fixed value. Initially, we present a 2-approximation algorithm and examine its worst case ratio bound. Then, a pseudo-polynomial dynamic programming algorithm is de...

متن کامل

Expected Duration of Dynamic Markov PERT Networks

Abstract : In this paper , we apply the stochastic dynamic programming to approximate the mean project completion time in dynamic Markov PERT networks. It is assumed that the activity durations are independent random variables with exponential distributions, but some social and economical problems influence the mean of activity durations. It is also assumed that the social problems evolve in ac...

متن کامل

OPTIMIZATION OF A PRODUCTION LOT SIZING PROBLEM WITH QUANTITY DISCOUNT

Dynamic lot sizing problem is one of the significant problem in industrial units and it has been considered by  many researchers. Considering the quantity discount in  purchasing cost is one of the important and practical assumptions in the field of inventory control models and it has been less focused in terms of stochastic version of dynamic lot sizing problem. In  this paper, stochastic dyn...

متن کامل

Reducing policy degradation in neuro-dynamic programming

We focus on neuro-dynamic programming methods to learn state-action value functions and outline some of the inherent problems to be faced, when performing reinforcement learning in combination with function approximation. In an attempt to overcome some of these problems, we develop a reinforcement learning method that monitors the learning process, enables the learner to reflect whether it is b...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1995